Diet code is healthy: simplifying programs for pre-trained models of code

Zhang, Zhaowei; Zhang, Hongyu; Shen, Beijun; Gu, Xiaodong

Relation: ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore 14-18 November, 2022) p. 1073-1084
Publisher Link: http://dx.doi.org/10.1145/3540250.3549094
Publisher: Association for Computing Machinery
Resource Type: conference paper
Date: 2022
Description: Pre-trained code representation models such as CodeBERT have demonstrated superior performance in a variety of software engineering tasks, yet they are computationally expensive: their complexity grows quadratically with the length of the input sequence. Our empirical analysis of CodeBERT's attention reveals that CodeBERT pays more attention to certain types of tokens and statements, such as keywords and data-relevant statements. Based on these findings, we propose DietCode, which aims at lightweight leverage of large pre-trained models for source code. DietCode simplifies the input program of CodeBERT with three strategies: word dropout, frequency filtering, and an attention-based strategy that selects the statements and tokens that receive the most attention weights during pre-training. This substantially reduces the computational cost without hampering model performance. Experimental results on two downstream tasks show that DietCode provides results comparable to CodeBERT's with 40% less computational cost in fine-tuning and testing.
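The three simplification strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: each strategy trims a tokenized program to a fixed token budget; frequency filtering is interpreted as dropping the most frequent tokens first; the per-token attention table `attn` is made up here (in DietCode the weights come from CodeBERT's attention); and the attention-based strategy is approximated by greedily keeping the statements with the highest attention per token that still fit the budget. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def word_dropout(tokens: Sequence[str], budget: int, seed: int = 0) -> List[str]:
    """Randomly drop tokens until at most `budget` remain, keeping their order."""
    budget = max(budget, 0)
    if len(tokens) <= budget:
        return list(tokens)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(tokens)), budget))
    return [tokens[i] for i in keep]


def frequency_filter(tokens: Sequence[str], budget: int) -> List[str]:
    """Assumed variant: drop the most frequent tokens first until `budget` remain."""
    budget = max(budget, 0)
    if len(tokens) <= budget:
        return list(tokens)
    counts = Counter(tokens)
    # Rank positions so that rarer tokens come first; ties keep source order.
    ranked = sorted(range(len(tokens)), key=lambda i: (counts[tokens[i]], i))
    keep = sorted(ranked[:budget])
    return [tokens[i] for i in keep]


def statement_score(stmt: Sequence[str], attn: Dict[str, float]) -> float:
    """Total (hypothetical) attention weight received by a statement's tokens."""
    return sum(attn.get(tok, 0.0) for tok in stmt)


def attention_select(statements: Sequence[List[str]],
                     attn: Dict[str, float], budget: int) -> List[str]:
    """Greedily keep statements with the highest attention per token that fit."""
    order = sorted(
        range(len(statements)),
        key=lambda i: statement_score(statements[i], attn) / max(len(statements[i]), 1),
        reverse=True,
    )
    chosen, used = [], 0
    for i in order:
        n = len(statements[i])
        if used + n <= budget:
            chosen.append(i)
            used += n
    # Emit the kept statements in their original program order.
    return [tok for i in sorted(chosen) for tok in statements[i]]


# Toy program split into statements, with made-up attention weights.
statements = [
    ["def", "add", "(", "a", ",", "b", ")", ":"],
    ["print", "(", "a", ")"],
    ["return", "a", "+", "b"],
]
attn = {"def": 0.9, "return": 0.8, "add": 0.5, "a": 0.3, "b": 0.3, "+": 0.2,
        "print": 0.1, "(": 0.05, ")": 0.05, ",": 0.02, ":": 0.02}
flat = [tok for stmt in statements for tok in stmt]

print(attention_select(statements, attn, budget=12))
# → ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
print(frequency_filter(flat, budget=12))
print(word_dropout(flat, budget=12))
```

With a 12-token budget, the attention-based sketch drops the low-attention `print` statement and keeps the signature and the `return` statement. Because self-attention cost grows quadratically with sequence length, keeping 12 of 16 tokens shrinks that term to (12/16)^2 ≈ 56% of its original size; the actual savings reported for DietCode come from the paper's own experiments, not from this toy example.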
Subject: program simplification; pre-trained models; learning program representations; code intelligence
Identifier: http://hdl.handle.net/1959.13/1494926
Identifier: uon:53919
Identifier: ISBN:9781450394130
Language: eng
Reviewed
